Using Bilingual Dependencies to Align Words in English/French Parallel Corpora
نویسنده
چکیده
This paper describes a word and phrase alignment approach based on a dependency analysis of French/English parallel corpora, referred to as alignment by “syntax-based propagation.” Both corpora are analysed with a deep and robust dependency parser. Starting with an anchor pair consisting of two words that are translations of one another within aligned sentences, the alignment link is propagated to syntactically connected words.
منابع مشابه
- 1 - A Program for Aligning Sentences in Bilingual Corpora
Researchers in both machine translation (e.g., Brown et al., 1990) and bilingual lexicography (e.g., Klavans and Tzoukermann, 1990) have recently become interested in studying bilingual corpora, bodies of text such as the Canadian Hansards (parliamentary proceedings) which are available in multiple languages (such as French and English). One useful step is to align the sentences, that is, to id...
متن کاملIdentifying Correspondences Between Words: An Approach Based On A Bilingual Syntactic Analysis Of French/English Parallel Corpora
We present a word alignment procedure based on a syntactic dependency analysis of French/English parallel corpora called “alignment by syntactic propagation”. Both corpora are analysed with a deep and robust parser. Starting with an anchor pair consisting of two words which are potential translations of one another within aligned sentences, the alignment link is propagated to the syntactically ...
متن کاملA Statistical View on Bilingual Lexicon Extraction: From Parallel Corpora to Non-parallel Corpora
We present two problems for statistically extracting bilingual lexicon: (1) How can noisy parallel corpora be used? (2) How can non-parallel yet comparable corpora be used? We describe our own work and contribution in relaxing the constraint of using only clean parallel corpora. DKvec is a method for extracting bilingual lexicons, from noisy parallel corpora based on arrival distances of words ...
متن کاملAnchor points for bilingual lexicon extraction from small comparable corpora
We examine the contribution of reliable elements in French– and English–Japanese alignment from comparable corpora, using transliterated elements and scientific compounds as anchor points among context-vectors of elements to align. We highlight those elements in context-vector normalisation to give them a higher priority in context-vector comparison. We carry out experiments on small comparable...
متن کاملCreating a Reusable English-Chinese Parallel Corpus for Bilingual Dictionary Construction
This paper first describes an experiment to construct an English-Chinese parallel corpus, then applying the Uplug word alignment tool on the corpus and finally produce and evaluate an English-Chinese word list. The Stockholm English-Chinese Parallel Corpus (SEC) was created by downloading English-Chinese parallel corpora from a Chinese web site containing law texts that have been manually trans...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2005